Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers, and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Mitchel Lichtman, or MGL, has been doing sabermetric research and writing for almost 25 years. He is one of the authors of The Book: Playing the Percentages in Baseball. He has consulted for several major-league teams over the years and has occasionally made a fool of himself on radio and TV. He holds a B.A. from Cornell University and a J.D. from the University of Nevada. You can check him out on Twitter at @MitchelLichtman or on his blog at www.mglbaseball.org.
If, like many of us, you’re a prolific baseball blog reader, you’ve probably heard a lot lately about the “times through the order” penalty (TTOP). For those of you who have no idea what that is, here is a quote from page 187 of The Book: Playing the Percentages in Baseball: “As the game goes on, the hitter has a progressively greater advantage over the starting pitcher.” Essentially, the more times a batter faces a pitcher during a game, the better he does at the plate.
The way the TTOP is traditionally measured is by looking at a starting pitcher’s performance using, say, wOBA against, the first time through the batting order, the second time, and so on. (Like TAv, wOBA is an all-in-one offensive rate statistic, but on the OBP scale instead of the BA scale.) Theoretically, a starter’s wOBA should be about the same for batters 1-9, and then 10-18, etc., since the pitcher is obviously the same, and in most cases the batters are more or less the same (I don’t include pitchers batting or pinch hitters). You might even think that a pitcher improves as the game goes on, as he gets thoroughly warmed up—especially on a cold night—and gets a feel for all of his pitches, at least until he perhaps enters a decline phase due to fatigue, assuming he is allowed to stay in the game that long.
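The bucketing described above (batters 1-9, then 10-18, and so on) can be sketched in a few lines. This is only an illustration: the `(batter_number, woba_value)` input layout and both function names are hypothetical, the sample values are made up, and the sketch ignores the article's exclusion of pitchers batting and pinch hitters.

```python
from collections import defaultdict


def times_through_order(batter_number: int) -> int:
    """Map the starter's running count of batters faced (1-based) to a TTO group."""
    if batter_number < 1:
        raise ValueError("batter_number starts at 1")
    return (batter_number - 1) // 9 + 1


def woba_by_tto(plate_appearances):
    """Average a per-PA wOBA value within each times-through-the-order group.

    plate_appearances: iterable of (batter_number, woba_value) pairs.
    This layout is a hypothetical stand-in for real play-by-play data.
    """
    totals = defaultdict(lambda: [0.0, 0])  # tto -> [sum of values, PA count]
    for batter_number, value in plate_appearances:
        bucket = totals[times_through_order(batter_number)]
        bucket[0] += value
        bucket[1] += 1
    return {tto: total / count for tto, (total, count) in sorted(totals.items())}


# Made-up sample: three PA the first time through, two the second, one the third.
sample = [(1, 0.0), (2, 0.9), (9, 0.0), (10, 0.7), (18, 1.25), (19, 2.0)]
result = woba_by_tto(sample)
print(result)
```

With real data, each PA would also need the adjustments for batter and pitcher quality that the article discusses later.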
But that’s not what we see, as the last letter of the acronym TTOP implies. Here are some actual numbers from The Book (p. 186, Table 81), based on data from 1999-2002. The total sample is 469,721 PA between starting pitchers and starting lineups, not including IBB and bunts.
| Times Through the Order | PA | wOBA |
|---|---|---|
| 1 | 163,900 | .345 |
| 2 | 158,872 | .354 |
| 3 | 124,603 | .362 |
| 4 | 22,221 | .354 |
As you can see, there is a significant and distinctive trend in the last column, at least through the third time through the order. Basically, batters get better and better from the first time facing a pitcher in a game to the second, and then again to the third, and then revert to “second time” levels by the time they have seen the pitcher for the fourth time. We’ll talk about that “fourth time” anomaly in a little while.
Another thing you can clearly see is that most pitchers make it through the order at least three times, which is actually something of a modern trend. In the past, starting pitchers pitched many more complete games, but they were also taken out earlier when they were getting shelled. It is also relatively rare for a pitcher today to face the order for the fourth time. That should not be surprising, since by the fourth trip through the lineup, pitch counts are usually elevated. On average, it takes about 100 pitches to get through the order exactly three times; the current average “pitches per PA” (P/PA) is around 3.8.
As you might expect, the pool of pitchers is not exactly the same for each TTO group, at least starting with the third time (and neither is the pool of batters). Pitchers in group three are slightly better than those in groups one and two, and the pitchers in group four are quite a bit better. Balancing this out is the fact that the quality of the batters in each group also rises slightly. Because of the disparity between the pitcher and batter pools in each group, the expected wOBA in each group is actually a little different, as you can see from the table below.
| TTO | Pitcher quality | Batter quality | Expected wOBA | Obs. wOBA |
|---|---|---|---|---|
| 1 | .349 | .347 | .353 | .345 |
| 2 | .349 | .348 | .353 | .354 |
| 3 | .348 | .350 | .354 | .362 |
| 4+ | .345 | .351 | .353 | .354 |
The significant rise in observed wOBA from the first through the third times through the order is not a result of any large changes in the pitcher and batter pools in each group. For all intents and purposes, the expected wOBA is the same in all groups. Something else must be going on.
If you are wondering which group represents a pitcher’s norm, conveniently, the second time through the order is almost exactly what we would expect from the pitcher overall. That is illustrated in columns 4 and 5 in row 2 of the table above. In the second time through the order, the expected wOBA, based on the pitchers’ and batters’ overall full-season numbers, is .353, and the observed wOBA is .354, almost exactly the same.
In summary, we can say this: The first time facing the lineup, the starting pitcher has the advantage, as compared to his overall “true talent.” The second time, the battle between the pitcher and batter is roughly neutral. The third time through the order, the batter gains the advantage. The fourth time, the balance appears to be neutral again; however that may not be quite true, as we will see in a while.
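The summary above can be checked directly against the expected and observed columns from The Book. The sketch below just subtracts one column from the other and reports the gap in points of wOBA; the dictionary and function names are illustrative only.

```python
# Expected and observed wOBA by time through the order,
# copied from the table above (1999-2002 data).
expected = {"1": 0.353, "2": 0.353, "3": 0.354, "4+": 0.353}
observed = {"1": 0.345, "2": 0.354, "3": 0.362, "4+": 0.354}


def gap_in_points(obs: float, exp: float) -> int:
    """Observed minus expected wOBA, in points (thousandths)."""
    return round((obs - exp) * 1000)


gaps = {tto: gap_in_points(observed[tto], expected[tto]) for tto in expected}
for tto, gap in gaps.items():
    # Negative favors the pitcher, positive favors the batter.
    print(f"TTO {tto}: {gap:+d} points vs. expected")
```

A negative gap the first time through and a positive gap the third time through is the penalty in its simplest form.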
Now that we’ve gotten the groundwork out of the way, let’s look at some interesting data and ask and answer some equally interesting questions. All data is now from 2000-2012. Again, pitchers batting and pinch hitters are not included.
First, we’ll look at the same data that we presented in The Book, but for 2000-2012.
| TTO | Pitcher quality | Batter quality | Obs. wOBA (adj.) |
|---|---|---|---|
| 1 | .346 | .340 | .340 |
| 2 | .345 | .340 | .350 |
| 3 | .343 | .343 | .359 |
| 4+ | .339 | .346 | .359 |
We basically see the same pattern that we found in The Book: around an 8-10-point increase each time through the order until the fourth (and later), at which point it levels out. The observed wOBA is a little lower than in The Book in the first three TTO groups because of the way it is calculated (no sacrifice hits; in The Book we removed all bunts). The pitcher and batter quality numbers do not have SH removed, which is why they are lower as well.
Now let’s focus in on the first inning. While the first inning usually contains only batters who are facing the starter for the first time, some crazy stuff is going on that we don’t see in the second or third innings when also facing the order for the first time. It has nothing to do with the quality of the batters faced. All the observed wOBA numbers you will see from now on (as well as in the previous table) are adjusted for the quality of the batters and pitchers faced.
| First Time Through the Order | TBF | wOBA |
|---|---|---|
| Inning one | 274,332 | .336 |
| All other innings | 258,871 | .344 |
There seems to be something about the first inning that gives the pitcher an eight-point wOBA advantage as compared to the first time through the order in the second or third inning. Again, we might have assumed the opposite—that hitters should have the advantage, as pitchers need some more time to acclimate themselves to the mound, find out which pitches are working for them, etc. On the other hand, hitters haven’t seen any real pitching since their last game, they may have been sitting on the bench for some time, and they probably haven’t seen that particular pitcher for a while, if ever.
What happens if we split the above sample into home and away?
| First Time Through the Order | Home team batters wOBA | Road team batters wOBA |
|---|---|---|
| Inning one | .347 | .324 |
| All other innings | .351 | .338 |
The first time through the order, the home team has only a four-point hitting disadvantage in the first inning, as opposed to the second or third inning, but the road team hits a whopping 14 points worse! Your guess as to why there is such a large discrepancy between the home and road team in the first inning is as good as mine. Maybe coming to the plate before playing the field is a disadvantage for the visiting hitters, similar to the DH or PH penalty. Maybe it takes the visiting starter or even the fielders more time to get used to the mound and the playing field (although the data suggests that it is a hitting problem and not a defensive one). What’s clear, however, is that the home field advantage is extremely large in the first inning, larger than in any other inning by a long shot.
What about by the second time through the order? Has this imbalance between the home and road teams disappeared or at least dissipated? Let’s look at all the TTO data split by home and road pitcher.
| Times Through the Order | Road Pitcher wOBA against | Home Pitcher wOBA against |
|---|---|---|
| 1 (inning 1) | .347 | .324 |
| 1 (innings 2 and 3) | .351 | .338 |
| 1 (all innings) | .349 | .331 |
| 2 | .355 | .346 |
| 3 | .364 | .354 |
| 4 | .362 | .354 |
| All | .356 | .343 |
It does appear that by the time we get to the second time through the order, the imbalance is mostly gone. The difference between the home and road wOBA the first time through the order is 18 points. The second, third, and fourth times through the order, the differences are all around nine points. One of the things to take out of this is that the home team starting pitcher derives a large portion of his home field advantage from pitching in the first inning. Relievers are not so fortunate. If you’re a pitcher and you want to pump up your stats, start all your games at home, and after you’ve faced nine batters, get the heck out of Dodge!
Let’s briefly get back to that funky fourth time through the order, when it seems that the TTOP stops dead in its tracks. Does the batter’s advantage level off by the time he’s seen the pitcher for the fourth time? Actually, not as much as it appears.
A while ago I stumbled on something interesting about what happens when a starter lasts into the ninth inning or later. The starter’s team is probably winning, of course, but the margin of victory also tends to be large. In other words, in the very late innings, if it is a one- or two-run game—or even tied—the closer or other short reliever is likely to be on the mound rather than the starter. And when the game is not close, especially in a blowout, for some reason wOBA does not do well in reflecting the losing team’s approach at the plate. Consequently, wOBA in the ninth inning or later, with a starter in the game, is artificially low. If we remove the ninth inning and later from the “fourth time through the order” data, we see the wOBA rise accordingly.
The other thing that is relevant is the temperature of the game when the lineup bats for the fourth time. In night games it is much colder, and most major league games are played at night. Let’s look at the regular TTO numbers, but this time we’ll do two things: One, we’ll include only up to the eighth inning, and two, we’ll split the data into three groups: outdoor day and night games, and indoor games.
| Times Through the Order (through 8 innings only) | wOBA | wOBA Day Games | wOBA Night Games | wOBA indoor games (or roof closed in SEA and MIL) |
|---|---|---|---|---|
| 1 | .340 | .337 | .343 | .335 |
| 2 | .350 | .349 | .351 | .345 |
| 3 | .359 | .361 | .359 | .358 |
| 4+ | .361 | .364 | .359 | .366 |
Eliminating the ninth inning and later raises the wOBA the fourth time through the order by two points in all games combined. And as you can also see, in day games it rises a little more, while it stays flat in night games. In the indoor games, where temperature is not a factor, we actually see a fairly large increase from the third to the fourth times through the order—eight points. In day games, we see only a three-point jump. Maybe in the daytime the temperature decreases a little between the third and fourth times, or maybe the batters and umpires are tired and want to go home. Again, your guess is as good as mine in explaining the above patterns. Suffice it to say that once weather is removed, as well as the ninth inning and later, we do in fact see a steady TTOP all the way through to the fourth or later time through the order.
What about the quality of the pitcher? Does that affect the penalty? Are good pitchers good at least partly because they don’t suffer as extreme a penalty, and vice versa for bad pitchers?
| Times Through the Order | Good pitchers (<.320 wOBA against for that season) | Bad pitchers (>.340 wOBA against for that season) |
|---|---|---|
| 1 | .297 | .365 |
| 2 | .305 | .376 |
| 3 | .317 | .386 |
| 4 | .321 | .387 |
Interestingly, the really good pitchers show a fairly modest penalty from the first to the second time through the order (eight points), while the bad pitchers pitch 11 points worse. However, from the second to the third time, the aces get 12 points worse and the poor pitchers, 10. These differences could easily be due to sampling error. In any case, it is clear that great pitchers are by no means immune to the dreaded TTOP. These are starters who are elite pitchers, on average a run per nine innings better than the typical pitcher, yet by the time they face the lineup for the fourth time, they are barely .3 runs per nine above average. By the third go-around, both groups of starting pitchers, the aces and the duds, lose about 20 points in wOBA as compared to their first go-around, and around 10-12 points as compared to their overall numbers.
During the fifth game of the World Series, several people wondered whether Jon Lester would not suffer from the typical TTOP. They used that speculation to partially defend John Farrell’s decision to let Lester hit in the top of the seventh inning and continue to pitch in the bottom of the seventh, even though he was facing the Cardinals lineup for the third time. By that time, if the TTOP was in effect, we would have expected Lester to be a slightly above-average starter rather than the roughly no. 2 starter that he normally is (notwithstanding any potential “hot hand” effects resulting from pitching a good game so far). The third time through the order, the typical penalty is around .35 runs per 9 innings compared to a starter’s overall RA9.
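The third-time penalty is quoted above both as 10-12 points of wOBA relative to a starter's overall numbers and as about .35 runs per nine innings. A rough sketch of how the two figures could be related follows. It rests on two assumptions that are not stated in this article: a wOBA-to-runs scale of about 1.15, and roughly 38 batters faced per nine innings. It is not necessarily how the .35 figure was actually derived.

```python
WOBA_SCALE = 1.15    # assumption: approximate wOBA-to-runs scale, not given in the article
PA_PER_NINE = 38.0   # assumption: rough number of batters faced per nine innings


def woba_points_to_runs_per_nine(points: float,
                                 woba_scale: float = WOBA_SCALE,
                                 pa_per_nine: float = PA_PER_NINE) -> float:
    """Convert a wOBA difference in points (thousandths) to runs per nine innings."""
    runs_per_pa = (points / 1000.0) / woba_scale
    return runs_per_pa * pa_per_nine


low = woba_points_to_runs_per_nine(10)
high = woba_points_to_runs_per_nine(12)
print(f"10 points: {low:.2f} runs per nine")
print(f"12 points: {high:.2f} runs per nine")
```

Under those assumptions, a 10-12 point penalty brackets the .35 runs per nine quoted in the text.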
The evidence that the Farrell defenders gave for Lester possibly being immune to the penalty was that in his career he has not shown the typical TTOP. I looked at 2009-2012 (I don’t have the 2013 data handy), and here is what I found for Lester.
| Times Through the Order | Lester’s wOBA against |
|---|---|
| 1 | .320 |
| 2 | .327 |
| 3 | .327 |
| 4 | .356 |
| Overall | .326 |
We are not dealing with tremendously large sample sizes in each group, of course, so we don’t expect these numbers to be especially reliable, and it is unlikely that they would exactly mimic the pattern of the average starting pitcher. That said, Lester does show a roughly typical penalty from the first to the second time, no penalty from the second to the third, and an exceedingly large jump from the third to the fourth (the number of TBF in the fourth group is only around 165). However, before we can put any stock in the predictive nature of a player’s own patterns or deviations from the league average, we must estimate how much to regress that data toward the league mean—the typical TTO penalties.
That’s the same thing we do for platoon splits, BABIP, or even overall performance itself, like FIP, ERA, or wOBA against, when creating projections or estimating true talent. As it turns out, a pitcher’s past deviations from the league average, in terms of their TTO penalties from the first to the fourth times through the lineup, are not very predictive, much like BABIP. When I computed year-to-year correlations for all pitchers with at least 100 TBF in each “times through the order” group per season (an average of around 220 TBF per group), I got “r” values of around .03 for around 500 data points. That means that it would take around 7,100 TBF or 1,650 innings pitched (roughly eight seasons for a full-time starter) before we would regress a pitcher’s own TTOP pattern 50 percent toward that of the average starter. So unless a pitcher had a long history of a significantly larger or smaller TTOP than the average starting pitcher, we can assume that he will lose around .35 runs per nine innings the third time through the order. Keep in mind that because of the relatively small samples we are dealing with, the 95 percent confidence interval around the .03 correlation is roughly -.06 to .12.
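The arithmetic behind those figures can be reproduced with two standard tools: the regression-toward-the-mean relation r = n / (n + k), and a Fisher z-transformation interval for a correlation. The sketch below assumes those are the methods behind the numbers quoted above, which the article does not spell out; the function names are illustrative.

```python
import math


def regression_constant(r: float, n: float) -> float:
    """Sample size k at which observed data is regressed 50% toward the mean.

    Uses the common relation r = n / (n + k), solved for k.
    """
    return n * (1.0 - r) / r


def regression_fraction(n_obs: float, k: float) -> float:
    """Share of the estimate that comes from the league mean after n_obs TBF."""
    return k / (k + n_obs)


def fisher_interval(r: float, n_pairs: int, z_crit: float = 1.96):
    """Approximate confidence interval for a correlation via Fisher's z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n_pairs - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r of about .03 on samples averaging about 220 TBF, from about 500 data points.
k = regression_constant(r=0.03, n=220)
low, high = fisher_interval(r=0.03, n_pairs=500)
print(f"k is about {k:,.0f} TBF")
print(f"95% interval for r: {low:.2f} to {high:.2f}")
```

The results land close to the 7,100 TBF and the -.06 to .12 interval given in the text. Because k scales with (1 - r) / r, a true correlation near the top of that interval would imply a much smaller k.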
I’m going to look at one more thing, and then I think you can truly say that you know everything about the now-famous (I hope) “times through the order” penalty. In that same World Series game, there was also some talk about the fact that Lester had thrown only 69 pitches after facing the lineup exactly twice, so maybe he wouldn’t suffer any third-time penalty—another attempt to justify Farrell’s decision to leave him in the game. After all, most starting pitchers won’t be fatigued after only 69 pitches. While that is true, the TTOP is not about fatigue. It is about familiarity. The more a batter sees a pitcher’s delivery and repertoire, the more likely he is to be successful against him. In fact, 69 pitches is not even a low number when it comes to facing the leadoff hitter for the third time. It takes an average starter about 68.4 pitches to get through the order two times (18 times 3.8, the average P/PA in MLB).
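The pitch-count arithmetic in that paragraph generalizes to any number of trips through the lineup. A minimal sketch, using the 3.8 P/PA average quoted in the article and a nine-man lineup (the function name is illustrative):

```python
PITCHES_PER_PA = 3.8  # average P/PA quoted in the article
LINEUP_SIZE = 9


def expected_pitches(times_through: int,
                     pitches_per_pa: float = PITCHES_PER_PA) -> float:
    """Average pitches needed to face the full lineup `times_through` times."""
    return times_through * LINEUP_SIZE * pitches_per_pa


for n in (1, 2, 3):
    print(f"{n} time(s) through the order: about {expected_pitches(n):.1f} pitches")
```

By this reckoning, Lester's 69 pitches after two trips through the Cardinals lineup was almost exactly average.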
That said, even though fatigue due to elevated pitch counts is likely not much of a factor in the TTOP, the more pitches a pitcher throws each time through the order, the more the opposing batters are able to acquaint themselves with the pitcher. How much does that affect the penalty?
I looked at that in two ways: First, I looked at the number of pitches thrown going into the second, third, and fourth times through the order. I split that up into two groups—a low pitch count and a high pitch count. Here are those results. The numbers in parentheses are the average number of pitches thrown going into that “time through the order.”
| Times Through the Order | Low Pitch Count | High Pitch Count |
|---|---|---|
| 1 | .341 | .340 |
| 2 | .351 (28) | .349 (37) |
| 3 | .359 (59) | .359 (72) |
| 4 | .361 (78) | .360 (97) |
We don’t see much difference there. In general, number of pitches thrown does not seem to be a factor in determining how much of a penalty a starter is going to suffer each time through the order.
The second, and better, way I examined this question was this: I looked only at individual batters in each group who had seen few or many pitches in their prior PA. For example, I looked at batters in their second time through the order who had seen fewer than three pitches in their first PA, and also batters who saw more than four pitches in their first PA. Those were my two groups. I did the same thing for each time through the order. Here are those results. The numbers in parentheses are the average number of pitches seen per PA so far in the game, for every batter in the group.
| Times Through the Order | Low Pitch Count each Batter | High Pitch Count each Batter |
|---|---|---|
| 1 | .340 | .340 |
| 2 | .350 (1.9) | .365 (4.3) |
| 3 | .359 (2.2) | .361 (4.3) |
| 4 | .361 (2.3) | .353 (4.3) |
Wow! If a batter has seen more than four pitches in his first PA, he hits 25 points better the second time around. That is a huge revelation, I think.
As with the previous table, batters who’ve seen only two pitches or so during their first PA still benefit by 10 points in their next PA. So the big advantage seems to come from seeing a lot of pitches, especially in the first PA. This advantage seems to disappear by the third time through the order. By then, the “high pitch” batter has only a two-point advantage over the “low pitch” batter, whereas the second time through he has a 15-point advantage. The fourth-time numbers in the “high pitch” group probably suffer from sampling error, as the TBF are only around 3,300. In fact, if we combine the third and fourth times in the “high pitch” group, we still get a wOBA of .360. By the time batters get to the third time through the order, how many pitches they’ve seen is mostly irrelevant. But from the first to the second go-around, it seems to be huge.
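The point differences discussed here come straight from the per-batter table. A short sketch that recomputes them, with illustrative variable names and the wOBA values copied from that table:

```python
# wOBA by time through the order for batters who saw few or many pitches
# in their earlier PA, copied from the per-batter table above.
low_pitch = {1: 0.340, 2: 0.350, 3: 0.359, 4: 0.361}
high_pitch = {1: 0.340, 2: 0.365, 3: 0.361, 4: 0.353}


def points(delta: float) -> int:
    """Express a wOBA difference in points (thousandths)."""
    return round(delta * 1000)


# Gain from the first to the second time through, within each group.
gain_low = points(low_pitch[2] - low_pitch[1])
gain_high = points(high_pitch[2] - high_pitch[1])

# Advantage of the high-pitch group over the low-pitch group at each trip.
high_minus_low = {tto: points(high_pitch[tto] - low_pitch[tto]) for tto in low_pitch}

print(f"Second-time gain: low {gain_low}, high {gain_high}")
print(f"High minus low by TTO: {high_minus_low}")
```

The fourth-time entry comes out negative, which is the small-sample figure the text cautions against reading too much into.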
Batters who are patient are indeed imparting a benefit to their team. But it is not what most people think. It is not in order to drive the starter out of the game early—against most starters, especially the poorer ones, that would actually be a bad thing for the batting team! The benefit is to the batter himself. The more pitches he sees, the better his next PA, at least from the first to the second time through the order.
Let’s recap what we learned today about the “times through the order” penalty.
- The first time through the order, pitchers pitch better than they do overall. This “first time” effect is magnified in the first inning, especially for the home pitcher.
- Starters get progressively worse as they face the lineup for the second, third, and fourth times. The fourth-time penalty gets masked in outdoor games, especially at night, and in the ninth and later innings.
- A pitcher’s career “times through the order” patterns have almost no predictive value. We can assume that all starting pitchers have roughly the same “true talent” TTOP, regardless of what they have shown in the past.
- Good and bad pitchers show around the same magnitude of TTOP. The third time through the order, all starters are expected to pitch around .35 runs per nine innings worse than they do overall.
- Pitch count does not seem to have much of an effect on the TTOP. For example, going into the third time through the order, whether a pitcher has thrown 60 or 75 pitches doesn’t seem to matter much.
- For an individual batter, the number of pitches seen makes a huge difference. The largest difference is from the first to the second time through the order. If a batter sees fewer than three pitches in his first PA, he hits 10 points better his second time at the plate. If he sees more than four pitches his first time up, he hits 25 points better on his second go-around!
As you can see, the “times through the order” penalty is a significant effect that should be incorporated into a manager’s decision about when to remove a starting pitcher. In fact, it would behoove managers and pitching coaches to be much more mindful of a starter’s “times through the order” than his pitch count. In an article I wrote two years ago about the benefit of “quick hooks,” I showed that a typical NL team could add from a half to a full win per season simply by removing a starting pitcher who is not an ace whenever he comes to bat in a high-leverage situation after pitching at least five innings, even if his replacement is a league-average reliever. Even in AL parks, where pitchers don’t bat, managers should be inclined to replace a pitcher, especially a fourth or fifth starter, as soon as he faces the order for the third time. These mediocre or worse starters are likely at or near replacement level by this time, even if they have been pitching well.
If you are watching a game and feel inclined to criticize or (less likely) praise your favorite manager, make sure that you don’t forget to consider everything you just learned about the “times through the order” penalty.
I think, as well, this understates the risks of overtaxing a bullpen and the stacked effect that can build long term. The analysis feels more relevant to the playoff context (when relievers can pitch nearly every game thanks to off days) than it does to the regular season context where managers try to avoid using relievers more than two days in a row.
In any case, Verlander's career splits (all this data are easily available on bbref):
1st time through the order: .629 OPS
2nd time through the order: .638
3rd time through the order: .706
4th time through the order: .666, in 1/5th the sample size
This is exactly the language I object to here. I don't agree that we can ignore all past production and pretend that every single pitcher is applicable to the same curve that overarchingly reflects pitchers. I say this as a sabr-loving, BP-subscribing, baseball nerd, but this to me feels strongly of ignoring the less tangible elements of pitching in favor of a broad, potentially more satisfying, conclusion. This feels just like the early days of DIPS - DIPS certainly has value and is overarchingly accurate, but can struggle on an individual level - it seems too extreme for me to accept as gospel anything that creates a hard and fast bright line rule that ignores any less mathematical analysis of player performance or game theory.
In any case, your "opinion" doesn't matter. The math speaks for itself. If all you know is a pitcher's past times through the order numbers, the math tells us that we can't use that to predict the future. If you want to argue with the math, be my guest.
"I think, as well, this understates the risks of overtaxing a bullpen and the stacked effect that can build long term."
Well, I'm not advocating anything. I'm simply giving and explaining the data. What a manager wants to do with that is up to him - not me. But I think that it would behoove managers to understand this phenomenon in order to make those decisions, don't you?
The main problem I have with this article is that you assume that past TTOP is the only information one can use to predict future TTOP. At least that is what it sounds like when you proclaim, "We can assume that all starting pitchers have roughly the same “true talent” TTOP". No, we can't make this assumption. All we can state is that in this specific case, if one is limited to a certain data set to project a certain skill, that data set can be largely ignored. It does not say anything about whether there can be individual pitchers with truly abnormal, sustainable TTOP, or anything about how one might identify them. There are only 150 starting pitchers at any given time. There is no need to save data processing time by adopting generalizations, when plenty of intellectual throughput exists to evaluate each player individually. The same goes for DIPS, platoon splits, home splits, etc.
And these principles need to be integrated into the manager's decisions (and plans). You obviously can't just remove all of your starters after two times through the order and stack another 350 innings on your bullpen (except in the playoffs), but there are great opportunities to optimize when you use short hooks (day games and in domes) and when you let your starter ride (night games early in the season).
I'd calculated some related numbers for how batters fare in their second plate appearance, both based on total pitches faced in their first PA as well as the number of *unique pitch types* seen in their first PA.
This is what I found for 2011-Aug 2013, second PA performance per first PA unique pitches seen (not adjusted for pitcher/batter quality though):
| 1st PA Unique | UIBB% | K% | wOBA | BACON |
|---|---|---|---|---|
| 1 | 6.9% | 9.5% | .312 | .284 |
| 2 | 7.1% | 9.1% | .321 | .289 |
| 3 | 7.2% | 8.9% | .322 | .290 |
| 4+ | 7.3% | 8.7% | .328 | .291 |
So basically everything gets better in terms of second PA performance the more unique pitch types you see in your first PA. You walk more, strike out less, better wOBA and better Batting Average on CONtact (including HRs).
Really enjoyed the article!
I think you want to somehow separate number of unique pitches from number of pitches altogether. For all we know, you may be simply picking up the effect of number of pitches, period, and not unique pitches.
You want to do something like, one group are batters who saw 4+ pitches but at least 3 unique pitches and the other group are batters who saw 4+ pitches but they were all the same. Or something like that where we can separate the effects.
It is also always nice (somewhat mandatory actually) to at least report (if not control for) the quality of the pitchers and batters in each group. Once you start looking at number of unique pitches thrown or seen, you could easily have substantial differences in the quality of the batters and pitchers in each group. For example, when I was looking at base stealing, I established two groups, one was pitchers who allowed a lot of steal attempts and the other was pitchers who did not. To my surprise the latter group were much better pitchers, based on wOBA against (i.e., not even considering SB/CS against). If I had not controlled for pitcher quality in my research, I would have been in trouble.
Nice work!
The IBB control is pretty clear - but there might not be enough data points. There should be enough for 4 pitch walks over a season or two. I'm not sure exactly what the numbers would be telling us for the 4 pitch walk, but it would be interesting to have.
Nicely done.
And I don't trust managers and pitching coaches to be able to figure that out in the middle of a game. They are too focused on and biased by results. For example, if we could somehow know when a pitcher was indeed tired and we allowed that pitcher to throw one more inning and he struck out the side (even tired or bad pitchers can pitch well, right?), I would be willing to bet my last dollar that a manager or pitching coach would think that he is just fine! And vice versa. If we could know that a pitcher after 103 pitches was NOT tired, and he were to give up 2 walks and a HR (that happens to non-fatigued and good pitchers, believe it or not), I would also bet my last dollar that managers and coaches would take them out and tell us that they were tired.
It would take a really geeked up front office to try to enforce such a pattern, though. And the only teams who might be willing to try to do something like that would be the desperate teams, the ones with less talent, and the results would probably not look good, even if the strategy produced better pitching numbers than expected.
Forty years from now my kids could be saying, "Heck, I remember when pitchers were 'real men' and were going six and even seven innings each start. They weren't being mollycoddled like today's four-inning wimps. Those were the days!"
On a given night, a particular pitcher may be performing so well relative to his 'true talent level', that even with the penalty, he is still better than any available reliever.
I would be interested to see further incorporation of jroegele's work involving pitch 'types' as well. PitchFX has some limitations in how well it distinguishes pitches, and we do start to risk small samples, but that may be the fundamental cause of this penalty. I also wonder how much of an effect there is if the individual batters are more or less familiar with the particular pitcher, though again, small sample sizes may mask any true results.
And how would managers and pitching coaches be able to recognize that? I submit (quite confidently) that they are so results oriented that they can't and don't. They get almost everything else wrong (sorry, but that's true), why would they get this one right? If you don't believe that they "get almost everything else wrong," please listen to the ex-players and managers who are the color commentators on TV broadcasts. They constantly spew nonsense. These are the same guys that manage teams.
See, now this is a beautiful example of what old-schoolers don't like about statheads. You've run some very interesting numbers, found and quantified an effect, and then stated it as an absolute rule and that any manager who doesn't slavishly obey it is wrong.
Never mind that this is an average built up over a larger sample, and therefore half of the performances are better than that. You state that aspects of this have "no predictive value" in specific instances, yet totally dismiss the idea that there may be other information available to the manager to be weighed as well. And then you bring in the TV guys, who while they may know about the other inputs, don't have in the booth what the real manager has in the dugout, so of course they are a good yardstick...
Great math, but pardon me for finding the conclusion a little less absolute on an individual game basis...
Just because someone is pitching a two hit shutout, it doesn't necessarily mean that they are a) pitching above their normal talent level or b) a better bet than a quality set up man or closer to pitch the last inning or two. Most good relievers are better than pretty much every starter over one inning, especially if the starter has already gone seven innings and three times through the order.
It also doesn't mean that they aren't. You're also making an assumption about the quality of relievers available at the given moment.
I agree that this is really cool work (in the aggregate), and helps explain some of what we see. This should be one club in a manager's bag, but he should not wield it to the exclusion of all others. He has other in-the-moment information that may help tell him if his starter is running on the good side of the mean today, or whether he expects his relievers to be running on the good or bad side of their norms right now.
I would like to see if it holds true for the bottom of the order guys as well as the better hitters, but that's probably a pretty big project
I also wonder if it holds true for RP'ers. A typical RP'er may only face the same hitter 3-4 times in a year, if that, so it would be much harder to test, perhaps impossible due to the smaller samples. I always wondered if Red Sox hitters' familiarity with Rivera over the years had anything to do with Rivera's relatively poor performance against them (albeit still elite performance).
"That means that it would take around 30,000 TBF or 7,000 innings pitched (roughly 35 years) before we would regress a pitcher’s own TTOP pattern 50 percent toward that of the average starter."
Should be, "around 7,000 TBF or around 1600 IP," which is 8 years and not 35 years. So if we have 3 years for a pitcher, we would regress his own penalties around 73% toward the mean. Of course "the mean" could be different for different kinds of pitchers. For example, pitchers with many different pitches like a Felix, may have a lower TTOP than, say, a pitcher who throws mostly fastballs - I don't know. Plus, the 95% or 99% confidence interval around that .03 correlation can be as low as no correlation at all (it is highly unlikely to be a true negative correlation) and as high as .1 or .13 or so.
Thanks to Jared Cross for picking up that error (on The Book blog).
For your wOBA data going through 2012, did you keep the same linear weights for each event as published in The Book or did you use adjusted ones for each year?
Thanks!
Jonathan